Transect

Groupby and transform allow me to combine rows into a single 'transect' row. Or, use a multiIndex, a hierarchical index, so I can target specific cells using id and type. The index item for a multiIndex is a TUPLE.


In [1]:
%matplotlib inline
import sys
import numpy as np
import pandas as pd
import json
import matplotlib.pyplot as plt
from io import StringIO
print(sys.version)
print("Pandas:", pd.__version__)


3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)]
Pandas: 0.19.2

In [2]:
df = pd.read_csv('C:/Users/Peter/Documents/atlas/atlasdata/obs_types/transect.csv', parse_dates=['date'])
df = df.astype(dtype='str')# we don't need numbers in this dataset.
df=df.replace('nan','')
#this turns dates into strings with the proper format for JSON:
#df['date'] = df['date'].dt.strftime('%Y-%m-%d')

In [3]:
df.type = df.type.str.replace('\*remonitoring notes','transect')
df.type = df.type.str.replace('\*plot summary','transect')

shift data to correct column

using loc for assignment: df.loc[destination condition, column] = df.loc[source]


In [4]:
df.loc[df.type =='map',['mapPhoto']]=df['url'] #moving cell values to correct column

In [5]:
df.loc[df.type.str.contains('lineminus'),['miscPhoto']]=df['url']
df.loc[df.type.str.contains('lineplus'),['miscPhoto']]=df['url']
df.loc[df.type.str.contains('misc'),['miscPhoto']]=df['url']

In [40]:
#now to deal with type='photo'
photos = df[df.type=='photo']
nonphotos = df[df.type != 'photo'] #we can concatenate these later
grouped = photos.groupby(['id','date'])

In [42]:
photos.shape


Out[42]:
(290, 25)

In [39]:
values=grouped.groups.values()
for value in values:
    photos.loc[value[2],['type']] = 'misc'
    #photos.loc[value[1],['type']] = 'linephoto2'


---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-39-6770c71c3dcc> in <module>()
      1 values=groups.groups.values()
      2 for value in values:
----> 3     photos.loc[value[2],['type']] = 'misc'
      4     #photos.loc[value[1],['type']] = 'linephoto2'

C:\ProgramData\Anaconda3\lib\site-packages\pandas\indexes\base.py in __getitem__(self, key)
   1421 
   1422         if is_scalar(key):
-> 1423             return getitem(key)
   1424 
   1425         if isinstance(key, slice):

IndexError: index 2 is out of bounds for axis 0 with size 2

In [37]:
photos.loc[photos.type=='linephoto1']


Out[37]:
group id type site_description general_observations mapPhoto linephoto1 linephoto2 miscPhoto label1 ... label4 value4 label5 value5 date observers url note obs_type newurl
58 AVSP1 AVSP1 linephoto1 ... 2011-05-01 http://soilcarboncoalition.org/data/avsp/may1-... view of area transect
210 BURR1 BURR1 linephoto1 ... 2011-01-05 http://soilcarboncoalition.org/data/burr/jan 2... One of the three wire exclosures at this site. transect
244 CFAR1 CFAR1 linephoto1 ... 2011-10-26 http://soilcarboncoalition.org/data/cfar/Oct28... looking south along transect transect
275 CHRL1 CHRL1 linephoto1 ... 2013-07-10 http://soilcarboncoalition.org/data/chrl/jul10... west along transect transect
301 CLAC1 CLAC1 linephoto1 ... 2011-10-24 http://soilcarboncoalition.org/data/clac/Oct28... east along transect transect
306 CMEA1 CMEA1 linephoto1 ... 2012-01-05 http://soilcarboncoalition.org/data/cmea/Jan5a... NE along transect transect
311 CMEA2 CMEA2 linephoto1 ... 2012-01-05 http://soilcarboncoalition.org/data/cmea/Jan5-... NE along transect transect
334 CWFA1 CWFA1 linephoto1 ... 2013-07-23 http://soilcarboncoalition.org/data/cwfa/cwf1 ... east along transect transect
344 DAVI1 DAVI1 linephoto1 ... 2012-12-06 http://soilcarboncoalition.org/data/davi/19Dec... northwest along transect transect
383 DIXN1 DIXN1 linephoto1 ... 2011-12-19 http://soilcarboncoalition.org/data/dixn/Dec20... north along transect; line of sight is to powe... transect
391 DIXN2 DIXN2 linephoto1 ... 2011-12-27 http://soilcarboncoalition.org/data/dixn/Dec31... NW along transect transect
400 DIXN3 DIXN3 linephoto1 ... 2011-12-27 http://soilcarboncoalition.org/data/dixn/Dec31... north along transect transect
408 DIXN4 DIXN4 linephoto1 ... 2011-12-28 http://soilcarboncoalition.org/data/dixn/Dec31... south along transect transect
413 DIXN5 DIXN5 linephoto1 ... 2012-01-28 http://soilcarboncoalition.org/data/dixn/Jan29... looking north along tape transect
417 DIXN6 DIXN6 linephoto1 ... 2012-01-28 http://soilcarboncoalition.org/data/dixn/Jan29... north along tape showing plot ctr transect
421 DIXN7 DIXN7 linephoto1 ... 2012-01-31 http://soilcarboncoalition.org/data/dixn/Feb2-... north along tape transect
425 DIXN8 DIXN8 linephoto1 ... 2012-02-01 http://soilcarboncoalition.org/data/dixn/Feb2-... east along tape transect
430 DIXN9 DIXN9 linephoto1 ... 2012-02-06 http://soilcarboncoalition.org/data/dixn/Feb6-... NE along transect transect
436 DJAY1 DJAY1 linephoto1 ... 2011-12-21 http://soilcarboncoalition.org/data/djay/Dec20... east along transect transect
441 DPEN1 DPEN1 linephoto1 ... 2012-01-12 http://soilcarboncoalition.org/data/dpen/Jan20... SW along transect transect
471 EBFA1 EBFA1 linephoto1 ... 2013-08-18 http://soilcarboncoalition.org/data/ebfa/ebf1 ... east along transect transect
479 ESCU1 ESCU1 linephoto1 ... 2012-12-04 http://soilcarboncoalition.org/data/escu/4Dec2... west along transect transect
485 FAIR1 FAIR1 linephoto1 ... 2013-06-06 http://soilcarboncoalition.org/data/fair/june6... SW along transect transect
500 FFAI1 FFAI1 linephoto1 ... 2011-10-31 http://soilcarboncoalition.org/data/ffai/Nov5-... looking east along transect to cell tower transect
505 FFAI2 FFAI2 linephoto1 ... 2011-10-31 http://soilcarboncoalition.org/data/ffai/Nov5-... looking east from photo pt along transect to c... transect
510 FLLP1 FLLP1 linephoto1 ... 2012-07-01 http://soilcarboncoalition.org/data/fllp/2July... along transect, east transect
515 FLLP2 FLLP2 linephoto1 ... 2012-07-01 http://soilcarboncoalition.org/data/fllp/2July... west along transect transect
541 FRNR1 FRNR1 linephoto1 ... 2011-05-03 http://soilcarboncoalition.org/data/fllp/fernr... along transect from photo point, looking south transect
554 FRTZ1 FRTZ1 linephoto1 ... 2011-09-15 http://soilcarboncoalition.org/data/frtz/Sept1... north along transect transect
559 FRTZ2 FRTZ2 linephoto1 ... 2011-09-15 http://soilcarboncoalition.org/data/frtz/Sept1... along transect, NW transect
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1396 SNED1 SNED1 linephoto1 ... 2012-01-04 http://soilcarboncoalition.org/data/sned/Jan4-... south along transect transect
1403 SNED2 SNED2 linephoto1 ... 2012-01-04 http://soilcarboncoalition.org/data/sned/Jan4-... north along transect transect
1445 SWRC1 SWRC1 linephoto1 ... 2013-08-12 http://soilcarboncoalition.org/data/swrc/swrc1... SW along transect transect
1450 TFIR1 TFIR1 linephoto1 ... 2012-10-26 http://soilcarboncoalition.org/data/tfir/Oct-2... looking south along transect transect
1455 TGER1 TGER1 linephoto1 ... 2011-09-13 http://soilcarboncoalition.org/data/tger/Sept1... north along transect toward terrace transect
1460 TGER2 TGER2 linephoto1 ... 2011-09-13 http://soilcarboncoalition.org/data/tger/Sept1... east along transect transect
1469 THOM1 THOM1 linephoto1 ... 2011-12-29 http://soilcarboncoalition.org/data/thom/Dec31... SW along transect transect
1482 TOMK1 TOMK1 linephoto1 ... 2013-05-18 http://soilcarboncoalition.org/data/tomk/may19... west along transect transect
1492 TOMK2 TOMK2 linephoto1 ... 2013-05-18 http://soilcarboncoalition.org/data/tomk/may19... southwest along transect transect
1501 TOMK3 TOMK3 linephoto1 ... 2013-05-20 http://soilcarboncoalition.org/data/tomk/may20... west along transect transect
1506 TOMK4 TOMK4 linephoto1 ... 2013-05-21 http://soilcarboncoalition.org/data/tomk/may22... south along transect from photo point transect
1512 TONA1 TONA1 linephoto1 ... 2011-10-09 http://soilcarboncoalition.org/data/sgor/tonat... along transect, looking north transect
1529 TOTO2 TOTO2 linephoto1 ... 2013-05-15 http://soilcarboncoalition.org/data/toto/may19... NE along transect transect
1536 TOTO3 TOTO3 linephoto1 ... 2013-05-15 http://soilcarboncoalition.org/data/toto/may19... SW along transect transect
1543 TSPG1 TSPG1 linephoto1 ... 2011-10-21 http://soilcarboncoalition.org/data/tspg/Oct22... along transect, looking south transect
1548 TTRA1 TTRA1 linephoto1 ... 2012-01-24 http://soilcarboncoalition.org/data/ttra/Jan29... north along transect transect
1553 TTRA2 TTRA2 linephoto1 ... 2012-01-24 http://soilcarboncoalition.org/data/ttra/Jan29... south along transect transect
1558 TWTR1 TWTR1 linephoto1 ... 2011-08-01 http://soilcarboncoalition.org/data/twtr/Aug1-... looking west from photo pt (bucket lid at 50') transect
1563 TWTR2 TWTR2 linephoto1 ... 2011-08-02 http://soilcarboncoalition.org/data/twtr/Aug2-... transect east from 130' transect
1568 TWTR3 TWTR3 linephoto1 ... 2011-08-02 http://soilcarboncoalition.org/data/twtr/Aug2-... along transect NW. Note how contour of ridgeli... transect
1573 UORG1 UORG1 linephoto1 ... 2012-01-15 http://soilcarboncoalition.org/data/uorg/Jan20... west along transect transect
1578 UORG2 UORG2 linephoto1 ... 2012-01-15 http://soilcarboncoalition.org/data/uorg/Jan20... NW along transect transect
1583 VPHE1 VPHE1 linephoto1 ... 2012-03-17 http://soilcarboncoalition.org/data/vphe/Mar18... SSW along transect transect
1599 WDAV1 WDAV1 linephoto1 ... 2011-12-23 http://soilcarboncoalition.org/data/wdav/Dec20... west along transect transect
1604 WEFA1 WEFA1 linephoto1 ... 2011-09-19 http://soilcarboncoalition.org/data/wefa/Sept1... north along transect transect
1609 WFOR1 WFOR1 linephoto1 ... 2011-12-21 http://soilcarboncoalition.org/data/wfor/Dec20... looking at anchor pt through brush transect
1615 WGNR1 WGNR1 linephoto1 ... 2011-11-01 http://soilcarboncoalition.org/data/wgnr/Nov5-... east along transect transect
1623 WORK1 WORK1 linephoto1 ... 2011-01-17 http://soilcarboncoalition.org/data/work/jan 2... looking ESE along transect from photo point transect
1631 WORK2 WORK2 linephoto1 ... 2011-01-18 http://soilcarboncoalition.org/data/work/jan 2... looking NNE along transect from photo pt transect
1639 WORK3 WORK3 linephoto1 ... 2011-01-18 http://soilcarboncoalition.org/data/work/jan 2... looking WSW along transect from photo point transect

125 rows × 25 columns


In [13]:
for name, group in grouped:
    print(grouped[name])


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-13-d2ef52c48240> in <module>()
      1 for name, group in grouped:
----> 2     print(grouped[name])

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\base.py in __getitem__(self, key)
    335                 bad_keys = list(set(key).difference(self.obj.columns))
    336                 raise KeyError("Columns not found: %s"
--> 337                                % str(bad_keys)[1:-1])
    338             return self._gotitem(list(key), ndim=2)
    339 

KeyError: "Columns not found: '2011-05-01', 'AVSP1'"

In [58]:
photos = df[df.type == 'photo']
photos.set_index(['id','date'],inplace=True)
photos.index[1]


Out[58]:
('AVSP1', '2011-05-01')

In [40]:
photos=df[df.type=='photo']
photos.groupby(['id','date']).count()


Out[40]:
group type site_description general_observations mapPhoto linephoto1 linephoto2 miscPhoto label1 value1 ... value3 label4 value4 label5 value5 observers url note obs_type newurl
id date
AVSP1 2011-05-01 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
BURR1 2011-01-05 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
CFAR1 2011-10-26 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
CHRL1 2013-07-10 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
CLAC1 2011-10-24 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
CMEA1 2012-01-05 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
CMEA2 2012-01-05 4 4 4 4 4 4 4 4 4 4 ... 4 4 4 4 4 4 4 4 4 4
CWFA1 2013-07-23 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
DAVI1 2012-12-06 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
DIXN1 2011-12-19 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
DIXN2 2011-12-27 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
DIXN3 2011-12-27 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
DIXN4 2011-12-28 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
DIXN5 2012-01-28 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1
DIXN6 2012-01-28 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1
DIXN7 2012-01-31 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1
DIXN8 2012-02-01 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
DIXN9 2012-02-06 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
DJAY1 2011-12-21 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
DPEN1 2012-01-12 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
EBFA1 2013-08-18 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
ESCU1 2012-12-04 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
FAIR1 2013-06-06 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
FFAI1 2011-10-31 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
FFAI2 2011-10-31 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
FLLP1 2012-07-01 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
FLLP2 2012-07-01 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
FRNR1 2011-05-03 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
FRTZ1 2011-09-15 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
FRTZ2 2011-09-15 4 4 4 4 4 4 4 4 4 4 ... 4 4 4 4 4 4 4 4 4 4
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
SNED1 2012-01-04 4 4 4 4 4 4 4 4 4 4 ... 4 4 4 4 4 4 4 4 4 4
SNED2 2012-01-04 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
SWRC1 2013-08-12 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
TFIR1 2012-10-26 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
TGER1 2011-09-13 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
TGER2 2011-09-13 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
THOM1 2011-12-29 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
TOMK1 2013-05-18 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
TOMK2 2013-05-18 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
TOMK3 2013-05-20 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
TOMK4 2013-05-21 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
TONA1 2011-10-09 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
TOTO2 2013-05-15 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
TOTO3 2013-05-15 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
TSPG1 2011-10-21 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
TTRA1 2012-01-24 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
TTRA2 2012-01-24 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
TWTR1 2011-08-01 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
TWTR2 2011-08-02 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
TWTR3 2011-08-02 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
UORG1 2012-01-15 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
UORG2 2012-01-15 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
VPHE1 2012-03-17 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
WDAV1 2011-12-23 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
WEFA1 2011-09-19 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
WFOR1 2011-12-21 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
WGNR1 2011-11-01 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
WORK1 2011-01-17 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
WORK2 2011-01-18 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2
WORK3 2011-01-18 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3

125 rows × 23 columns


In [31]:
photos.loc[photos.index[25],['type','note']]


Out[31]:
type note
id date
DIXN1 2011-12-19 photo north along transect; line of sight is to powe...
2011-12-19 photo south along transect

In [ ]:
#combine photo captions
df['caption']=''
df.loc[(df.type.str.contains('lineminus'))|(df.type.str.contains('lineplus')),['caption']]=df['type'] + ' | ' + df['note'] 
df.loc[df.type.str.contains('lineplus'),['caption']]=df['url']
df.loc[df.type.str.contains('misc'),['caption']]=df['url']

In [6]:
df['mystart'] = 'Baseline summary:'
df.loc[df.type =='transect',['site_description']]= df[['mystart','label1','value1','label2','value2','label3','value3','note']].apply(' | '.join, axis=1)
df.loc[df.type.str.contains('line-'),['linephoto1']]=df['url']
df.loc[df.type.str.contains('line\+'),['linephoto2']]=df['url']#be sure to escape the +
df.loc[df.type.str.contains('linephoto1'),['linephoto1']]=df['url']
df.loc[df.type.str.contains('linephoto2'),['linephoto2']]=df['url']

In [7]:
df.loc[df.type == 'plants',['general_observations']]=df['note']

use groupby and transform to fill the row


In [9]:
#since we're using string methods, NaNs won't work
mycols =['general_observations','mapPhoto','linephoto1','linephoto2','miscPhoto','site_description']
for item in mycols:
    df[item] = df[item].fillna('')

In [13]:
df.mapPhoto = df.groupby('id')['mapPhoto'].transform(lambda x: "%s" % ''.join(x))

In [14]:
df.linephoto1 = df.groupby(['id','date'])['linephoto1'].transform(lambda x: "%s" % ''.join(x))
df.linephoto2 = df.groupby(['id','date'])['linephoto2'].transform(lambda x: "%s" % ''.join(x))
df.miscPhoto = df.groupby(['id','date'])['miscPhoto'].transform(lambda x: "%s" % ''.join(x))

In [15]:
df['site_description'] = df['site_description'].str.strip()

In [16]:
df.to_csv('test.csv')
#done to here. Next, figure out what to do with linephotos, unclassified photos, and their notes.
#make column for photocaptions. When adding linephoto1, add 'note' and 'type' fields to caption column. E.g. 'linephoto1: 100line- | view east along transect.' Then join the rows in the groupby transform and add to site_description field.

In [172]:
df.shape


Out[172]:
(1643, 25)

In [171]:
df[(df.type.str.contains('line\+'))&(df.linephoto2.str.len()<50)]


Out[171]:
group id type site_description general_observations mapPhoto linephoto1 linephoto2 miscPhoto label1 ... label4 value4 label5 value5 date observers url note obs_type newurl

0 rows × 25 columns


In [56]:
maps.str.len().sort_values()


Out[56]:
1613     0
1610     0
1263     0
1258     0
505      0
502      0
495      0
499      0
1562     0
1547     0
430      1
450      1
236      1
240      1
1083    51
1092    51
1098    51
592     51
584     51
567     51
559     51
554     51
529     51
399     51
524     51
508     51
404     51
410     51
1075    51
415     51
        ..
2       56
16      56
4       56
275     56
271     56
281     56
279     56
277     56
273     56
269     56
267     56
174     56
0       56
168     56
163     56
261     57
459     57
285     57
283     57
80      57
74      57
98      57
18      57
20      57
22      57
92      57
68      57
86      57
1442    60
1145    65
Name: mapPhoto, dtype: int64

shift data to correct row using a multi-Index


In [280]:
ids = list(df['id'])#make a list of ids to iterate over, before the hierarchical index

In [281]:
#df.type = df.type.map({'\*plot summary':'transect','\*remonitoring notes':'transect'})

In [288]:
df.loc[df.type =='map',['mapPhoto']]=df['url'] #moving cell values to correct column

In [274]:
df.set_index(['id','type'],inplace=True) # hierarchical index so we can call locations

In [276]:
#a hierarchical index uses a tuple. You can set values using loc.
#this format: df.loc[destination] = df.loc[source].values[0]
for item in ids:
    df.loc[(item,'*plot summary'),'mapPhoto'] = df.loc[(item,'map'),'mapPhoto'].values[0]

#generates a pink warning about performance, but oh well.


C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py:1310: PerformanceWarning: indexing past lexsort depth may impact performance.
  return self._getitem_tuple(key)
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\kernelbase.py:390: PerformanceWarning: indexing past lexsort depth may impact performance.
  user_expressions, allow_stdin)
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\zmqshell.py:501: PerformanceWarning: indexing past lexsort depth may impact performance.
  return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)

In [10]:
#here we are using an expression in parens to test for a condition
(df['type'].str.contains('\s') & df['note'].notnull()).value_counts()


Out[10]:
False    24
True      6
dtype: int64

In [162]:
df.url = df.url.str.replace(' ','_');df.url


Out[162]:
0           tomk_antk_003.jpg
1           tomk_antk_001.jpg
2           tomk_antk_002.jpg
3           tomk_antk_003.jpg
4           tomk_ANTK1map.jpg
5           tomk_antk_007.jpg
6           tomk_antk_006.jpg
7           tomk_antk_007.jpg
8           tomk_ANTK2map.jpg
9         tomk_ANTK3line-.jpg
10        tomk_ANTK3line-.jpg
11        tomk_ANTK3line+.jpg
12          tomk_linemisc.jpg
13          tomk_ANTK3map.jpg
14     tomk_ANTK4-30line+.jpg
15     tomk_ANTK4-30line-.jpg
16     tomk_ANTK4-30line+.jpg
17    tomk_ANTK4-30line++.jpg
18          tomk_ANTK4map.jpg
19                        NaN
20     tomk_ANTK1-25line-.jpg
21     tomk_ANTK1-25line+.jpg
22                        NaN
23     tomk_ANTK2-30line-.jpg
24     tomk_ANTK2-30line+.jpg
25     avsp_may1-2011_028.jpg
26     avsp_may1-2011_029.jpg
27            avsp_AV1map.jpg
28     avsp_may1-2011_033.jpg
29     avsp_may1-2011_034.jpg
Name: url, dtype: object

In [151]:
df.url.head()


Out[151]:
0    http://soilcarboncoalition.org/data/tomk/antk ...
1    http://soilcarboncoalition.org/data/tomk/antk ...
2    http://soilcarboncoalition.org/data/tomk/antk ...
3    http://soilcarboncoalition.org/data/tomk/antk ...
4    http://soilcarboncoalition.org/data/tomk/ANTK1...
Name: url, dtype: object

In [ ]:
df['newurl'] = df.url.str.replace

In [130]:
df.newurl.head()


Out[130]:
0    tomk_antk_003.jpg
1    tomk_antk_001.jpg
2    tomk_antk_002.jpg
3    tomk_antk_003.jpg
4    tomk_ANTK1map.jpg
Name: newurl, dtype: object

In [ ]:
#for combining rows try something like this:
print(df.groupby('somecolumn')['temp variable'].apply(' '.join).reset_index())